Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
Authors
Abstract
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.
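To make the early-stopping procedure described above concrete, the sketch below trains an over-sized single-hidden-layer net with plain backprop (full-batch gradient descent) and halts once validation error stops improving, keeping the best weights seen so far. The synthetic task, hidden-layer size, learning rate, and patience value are illustrative assumptions, not the experimental setup used in the paper.

```python
# Minimal sketch of validation-based early stopping for an over-parameterized MLP.
# Hyperparameters and the toy task are assumptions chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression task with a smooth region (x < 0) and a more
# non-linear region (x >= 0), echoing the paper's low/high non-linearity regions.
def target(x):
    return np.where(x < 0, 0.5 * x, np.sin(5 * x))

X = rng.uniform(-1, 1, size=(200, 1))
y = target(X)
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

# Deliberately over-sized single hidden layer (far more units than the task needs).
H = 100
W1 = rng.normal(0, 0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, size=(H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

lr, patience = 0.05, 200
best_va, since_best, best = np.inf, 0, None

for epoch in range(20000):
    # Backprop step on the training set.
    h, pred = forward(X_tr)
    err = pred - y_tr                      # gradient of MSE w.r.t. pred (up to a constant)
    gW2 = h.T @ err / len(X_tr)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)       # tanh derivative
    gW1 = X_tr.T @ dh / len(X_tr)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    # Early stopping: track validation error and remember the best weights.
    va = mse(forward(X_va)[1], y_va)
    if va < best_va:
        best_va, since_best = va, 0
        best = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:
        since_best += 1
        if since_best > patience:
            break

W1, b1, W2, b2 = best
print(f"stopped at epoch {epoch}, best validation MSE = {best_va:.4f}")
```

The point of the sketch is only the stopping rule: training halts when the validation error has not improved for a fixed number of epochs, and the weights restored are those from the best validation point, so the large net is used at roughly the stage where it generalizes like a smaller one.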
Related papers
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is ...
Overfitting and Neural Networks: Conjugate Gradient and Backpropagation
Methods for controlling the bias/variance tradeoff typically assume that overfitting or overtraining is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g. based on validation tests), network size, or the amount of weight decay are commonly used to control the bias/variance tradeoff. However, the degree of overfitting can vary...
Early Stopping as Nonparametric Variational Inference
We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy over these distributions during optimization, we form a scalable, unbiased estim...
An Efficient Optimization Method for Extreme Learning Machine Using Artificial Bee Colony
Traditional learning algorithms based on gradient descent, such as back-propagation (BP) and its variant Levenberg-Marquardt (LM), have been widely used to train multilayer feedforward neural networks. Gradient descent based algorithms usually converge more slowly than desired, since such learning algorithms require many iterative learning steps, and...
A Study on Neural Network Training Algorithm for Multiface Detection in Static Images
This paper reports the results of a study on neural network training algorithms based on numerical optimization techniques for multiface detection in static images. The training algorithms involved are scaled conjugate gradient backpropagation, conjugate gradient backpropagation with Polak-Ribière updates, conjugate gradient backpropagation with Fletcher-Reeves updates, one-step secant backpropagation, and resilient ba...
Journal:
Volume / Issue:
Pages: -
Year of publication: 2000